Abstract

The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. In order to do that, we first derive some important properties of a pseudo-Poisson equation associated to the problem. Subsequently, it is shown that the PIA converges to a solution satisfying the optimality equation under some classical hypotheses, and that this optimal solution yields an optimal control strategy, in feedback form, for the average control problem of the continuous-time PDMP.
1 Introduction

This paper studies the policy iteration algorithm (PIA) for the average cost control problem of a class of continuous-time Markov processes, namely piecewise deterministic Markov processes (PDMPs). These processes were introduced in the literature by M.H.A. Davis [3] as a general class of stochastic models. They form a family of Markov processes involving deterministic motion punctuated by random jumps. The motion of the PDMP $\{X(t)\}$ depends on three local characteristics, namely the flow $\phi$, the jump rate $\lambda$ and the transition measure $Q$, which specifies the post-jump location. Starting from $x$, the motion of the process follows the flow $\phi(x,t)$ until the first jump time $T_1$, which occurs either spontaneously in a Poisson-like fashion with rate $\lambda(\phi(x,t))$ or when the flow $\phi(x,t)$ hits the boundary of the state space. In either case the location $Z_1$ of the process at the jump time $T_1$ is selected by the transition measure $Q(\phi(x,T_1),\cdot)$. Starting from $Z_1$, we now select the next inter-jump time $T_2 - T_1$ and post-jump location $X(T_2) = Z_2$ in a similar way. This gives a piecewise deterministic trajectory for $\{X(t)\}$ with jump times $\{T_k\}$ and post-jump locations $\{Z_k\}$, which follows the flow $\phi$ between two jumps. A suitable choice of the state space and of the local characteristics $\phi$, $\lambda$ and $Q$ provides stochastic models covering a great number of problems of operations research [7].
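The jump mechanism just described can be sketched in a few lines of Python. This is only an illustration under hypothetical choices made here, not a model from the paper: the state space is the interval $(0,1)$, the flow is $\phi(x,t) = x + t$ (so the boundary point $1$ is reached after time $1 - x$), the jump rate is a constant `rate`, and the post-jump location is drawn from $Q(y,\cdot) = \mathrm{Uniform}(0,y)$. The function name `simulate_pdmp` is ours.

```python
import random


def simulate_pdmp(x0, horizon, rate=2.0, seed=0):
    """Simulate a toy PDMP on E = (0, 1) up to time `horizon`.

    Hypothetical local characteristics (illustration only):
      flow        phi(x, t) = x + t, hence t_*(x) = 1 - x;
      jump rate   lambda(x) = rate (constant);
      transition  Q(y, .)   = Uniform(0, y).
    Returns a list of (T_k, X(T_k-), Z_k, hit_boundary) tuples.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    jumps = []
    while True:
        t_star = 1.0 - x                     # time needed by the flow to reach 1
        spontaneous = rng.expovariate(rate)  # P(S > s) = exp(-rate * s)
        hit_boundary = spontaneous >= t_star
        dt = t_star if hit_boundary else spontaneous  # T_k - T_{k-1} = min(S, t_*(x))
        if t + dt > horizon:
            break
        t += dt
        pre_jump = 1.0 if hit_boundary else x + dt    # phi(x, dt), the pre-jump point
        x = rng.uniform(0.0, pre_jump)                # post-jump location Z_k ~ Q(pre_jump, .)
        jumps.append((t, pre_jump, x, hit_boundary))
    return jumps
```

For example, `simulate_pdmp(0.5, 10.0)` returns the jump times, pre-jump points and post-jump locations of one trajectory on $[0,10]$; between two consecutive jump times the path simply follows the flow.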

The present work is a continuation of a series of papers. It deals with the long run average cost control problem of PDMPs taking values in a general Borel space. At each point $x$ of the state space a control variable is chosen from a compact action set $\mathbf{U}(x)$ and is applied to the jump parameter $\lambda$ and the transition measure $Q$. The long run average cost is composed of a running cost and a boundary cost (which is added each time the PDMP touches the boundary). In this context, we follow the idea developed in [1, ?], which consists of writing the optimality equation for the long run average cost control problem of the PDMP $\{X(t)\}$ in terms of a discrete-time optimality equation related to the embedded Markov chain given by the post-jump locations of the process $\{X(t)\}$. As pointed out in [1], this discrete-time optimality equation is different from the classical ones encountered in the context of discrete-time Markov decision processes. The two main reasons for doing so are to use the powerful tools developed in the discrete-time framework (see for example the references [?, ?, ?, 13]) and to avoid working with the infinitesimal generator associated to a PDMP, whose domain of definition is in most cases difficult to characterize.

The PIA has received considerable attention in the literature and consists of three steps: initialization; policy evaluation, which is related to the Poisson equation (PE) associated to the transition law defining the Markov decision process; and policy improvement. Without attempting to present here an exhaustive panorama of the literature on the PIA, we can mention the surveys [1, ?, ?, 13, 16] and the references therein, and more specifically the references [12, 15], which analyze the PIA in detail for general Markov decision processes and provide conditions that guarantee its convergence.

The paper is organized as follows. In Section 2 we formulate the control problem, while in Section 3 some of the main assumptions are presented. In our context, the policy evaluation step is connected to a kind of PE which we call a pseudo-Poisson equation. This equation is clearly different from a classical PE encountered in the literature on discrete-time Markov control processes; see Remark 4.2. However, although different, we show in Section 4 that this pseudo-Poisson equation still has the good properties that one might expect in order to guarantee the convergence of the policy iteration algorithm. These results are not straightforward to obtain due to the specific structure of this discrete-time optimality equation. Finally, in Section 5 the PIA is studied in detail. It is first shown that the PIA converges to a solution satisfying the optimality equation under some classical hypotheses. Subsequently, it is shown that this optimal solution yields an optimal control strategy, in feedback form, for the average control problem of the continuous-time PDMP.
2 Definitions and problem formulation

2.1 Presentation of the control problem

In this section we present some standard notation and some basic definitions related to the motion of a PDMP $\{X(t)\}$, and the control problems we will consider throughout the paper. For further details and properties the reader is referred to [?]. The following notation will be used in this paper: $\mathbb{N}$ denotes the set of natural numbers, $\mathbb{R}$ the set of real numbers, $\mathbb{R}_+$ the set of positive real numbers and $\mathbb{R}^d$ the $d$-dimensional Euclidean space. We write $\eta$ for the Lebesgue measure on $\mathbb{R}$. For $X$ a metric space, $\mathcal{B}(X)$ represents the $\sigma$-algebra generated by the open sets of $X$. $\mathcal{M}(X)$ (respectively, $\mathcal{P}(X)$) denotes the set of all finite (respectively, probability) measures on $(X,\mathcal{B}(X))$. Let $X$ and $Y$ be metric spaces. The set of all Borel measurable (respectively, bounded Borel measurable) functions from $X$ into $Y$ is denoted by $\mathbb{M}(X;Y)$ (respectively, $\mathbb{B}(X;Y)$). Moreover, for notational simplicity $\mathbb{M}(X)$ (respectively, $\mathbb{B}(X)$, $\mathbb{M}(X)^+$, $\mathbb{B}(X)^+$) denotes $\mathbb{M}(X;\mathbb{R})$ (respectively, $\mathbb{B}(X;\mathbb{R})$, $\mathbb{M}(X;\mathbb{R}_+)$, $\mathbb{B}(X;\mathbb{R}_+)$). For $g \in \mathbb{M}(X)$ with $g(x) > 0$ for all $x \in X$, $\mathbb{B}_g(X)$ is the set of functions $v \in \mathbb{M}(X)$ such that
$$\|v\|_g = \sup_{x \in X} \frac{|v(x)|}{g(x)} < +\infty.$$
$C(X)$ denotes the set of continuous functions from $X$ to $\mathbb{R}$. For $h \in \mathbb{M}(X)$, $h^+$ (respectively, $h^-$) denotes the positive (respectively, negative) part of $h$.

Let $E$ be an open subset of $\mathbb{R}^n$, $\partial E$ its boundary, and $\bar{E}$ its closure. A controlled PDMP is determined by its local characteristics $(\phi,\lambda,Q)$, as presented in the sequel. The flow $\phi(x,t)$ is a function $\phi : \mathbb{R}^n \times \mathbb{R}_+ \to \mathbb{R}^n$, continuous in $(x,t)$ and such that $\phi(x,t+s) = \phi(\phi(x,t),s)$. For each $x \in E$ the time the flow takes to reach the boundary starting from $x$ is defined as $t_*(x) = \inf\{t > 0 : \phi(x,t) \in \partial E\}$. For $x \in E$ such that $t_*(x) = \infty$ (that is, the flow starting from $x$ never touches the boundary), we set $\phi(x,t_*(x)) = \Delta$, where $\Delta$ is a fixed point in $\partial E$. We define the following space of functions absolutely continuous along the flow with limit towards the boundary:
$$\mathbb{M}^{ac}(E) = \Big\{ g \in \mathbb{M}(E) : t \mapsto g(\phi(x,t)) \text{ is absolutely continuous on } [0,t_*(x)) \text{ for each } x \in E, \text{ and whenever } t_*(x) < \infty \text{ the limit } \lim_{t \to t_*(x)} g(\phi(x,t)) \text{ exists} \Big\}.$$
For $g \in \mathbb{M}^{ac}(E)$ and $z \in \partial E$ for which there exists $x \in E$ such that $z = \phi(x,t_*(x))$ with $t_*(x) < \infty$, we define $g(z) = \lim_{t \to t_*(x)} g(\phi(x,t))$ (note that the limit exists by assumption). As shown in Lemma 2 in [?], for $g \in \mathbb{M}^{ac}(E)$ there exists a function $\mathcal{X}g \in \mathbb{M}(E)$ such that for all $x \in E$ and $t \in [0,t_*(x))$,
$$g(\phi(x,t)) - g(x) = \int_0^t \mathcal{X}g(\phi(x,s))\,ds.$$
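As a simple worked instance of this identity (with a toy flow chosen here for illustration, not taken from the paper), let $E = (0,1)$ and $\phi(x,t) = x + t$, so that $t_*(x) = 1 - x$. For $g(x) = x^2$ we may take $\mathcal{X}g(y) = 2y$, since for $t \in [0,t_*(x))$

$$g(\phi(x,t)) - g(x) = (x+t)^2 - x^2 = 2xt + t^2 = \int_0^t 2(x+s)\,ds = \int_0^t \mathcal{X}g(\phi(x,s))\,ds .$$

Moreover $\lim_{t \to t_*(x)} g(\phi(x,t)) = 1$, so $g \in \mathbb{M}^{ac}(E)$ and $g(z) = 1$ at the boundary point $z = 1$; in this one-dimensional example $\mathcal{X}g$ is simply the derivative of $g$ along the flow.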

The local characteristics $\lambda$ and $Q$ depend on a control action $u \in \mathbf{U}$, where $\mathbf{U}$ is a compact metric space (there is no loss of generality in assuming this property for $\mathbf{U}$, see Remark 2.8 in [?]), in the following way: $\lambda \in \mathbb{M}(E \times \mathbf{U})^+$ and $Q$ is a stochastic kernel on $E$ given $\bar{E} \times \mathbf{U}$. For each $x \in \bar{E}$ we define the subset $\mathbf{U}(x)$ of $\mathbf{U}$ as the set of feasible control actions that can be taken when the state process is at $x$; that is, the control action applied to $\lambda$ and $Q$ must belong to $\mathbf{U}(x)$. The following assumptions, based on the standard theory of Markov decision processes (see for example [11]), will be made throughout the paper:

Assumption 2.1 For all $x \in \bar{E}$, $\mathbf{U}(x)$ is a compact subspace of $\mathbf{U}$.

Assumption 2.2 The set $\mathbb{K} = \{(x,a) : x \in \bar{E},\ a \in \mathbf{U}(x)\}$ is a Borel subset of $\bar{E} \times \mathbf{U}$.

We present next the definition of an admissible control strategy and the associated motion of the controlled process. A control policy $U$ is a pair of functions $(u,u_\partial) \in \mathbb{M}(\mathbb{N} \times E \times \mathbb{R}_+;\mathbf{U}) \times \mathbb{M}(\mathbb{N} \times E;\mathbf{U})$ satisfying $u(n,x,t) \in \mathbf{U}(\phi(x,t))$ and $u_\partial(n,x) \in \mathbf{U}(\phi(x,t_*(x)))$ for all $(n,x,t) \in \mathbb{N} \times E \times \mathbb{R}_+$. The class of admissible control strategies will be denoted by $\mathcal{U}$. Consider the state space $\widehat{E} = \bar{E} \times E \times \mathbb{R}_+ \times \mathbb{N}$. For a control policy $U = (u,u_\partial)$ let us introduce the following parameters for $\hat{x} = (x,z,s,n) \in \widehat{E}$: the flow $\hat{\phi}(\hat{x},t) = (\phi(x,t),z,s+t,n)$, the jump rate $\lambda^U(\hat{x}) = \lambda(x,u(n,z,s))$, and the transition measure
$$Q^U(\hat{x}, A \times B \times \{0\} \times \{n+1\}) = \begin{cases} Q(x,u(n,z,s);A \cap B) & \text{if } x \in E,\\ Q(x,u_\partial(n,z);A \cap B) & \text{if } x \in \partial E,\end{cases}$$
for $A$ and $B$ in $\mathcal{B}(E)$. From [?, section 25], it can be shown that for any control strategy $U = (u,u_\partial) \in \mathcal{U}$ there exists a filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\},\{P^U_{\hat{x}}\}_{\hat{x} \in \widehat{E}})$ such that the piecewise deterministic Markov process $\{\widehat{X}^U(t)\}$ with local characteristics $(\hat{\phi},\lambda^U,Q^U)$ may be constructed as follows. For notational simplicity the probability $P^U_{\hat{x}_0}$ will be denoted by $P^U_{(x,k)}$ for $\hat{x}_0 = (x,x,0,k) \in \widehat{E}$. Take a random variable $T_1$ such that
$$P^U_{(x,k)}(T_1 > t) = \begin{cases} e^{-\Lambda^U(x,k,t)} & \text{for } t < t_*(x),\\ 0 & \text{for } t \geq t_*(x),\end{cases}$$
where for $x \in E$ and $t \in [0,t_*(x))$, $\Lambda^U(x,k,t) = \int_0^t \lambda(\phi(x,s),u(k,x,s))\,ds$. If $T_1$ is equal to infinity, then for $t \in \mathbb{R}_+$, $\widehat{X}^U(t) = (\phi(x,t),x,t,k)$. Otherwise select independently an $\widehat{E}$-valued random variable (labelled $\widehat{X}^U_1$) having distribution
$$P^U_{(x,k)}\big(\widehat{X}^U_1 \in A \times B \times \{0\} \times \{k+1\} \,\big|\, \sigma\{T_1\}\big) = \begin{cases} Q(\phi(x,T_1),u(k,x,T_1);A \cap B) & \text{if } \phi(x,T_1) \in E,\\ Q(\phi(x,T_1),u_\partial(k,x);A \cap B) & \text{if } \phi(x,T_1) \in \partial E.\end{cases}$$
The trajectory of $\{\widehat{X}^U(t)\}$ starting from $(x,x,0,k)$, for $t \leq T_1$, is given by
$$\widehat{X}^U(t) = \begin{cases} (\phi(x,t),x,t,k) & \text{for } t < T_1,\\ \widehat{X}^U_1 & \text{for } t = T_1.\end{cases}$$
Starting from $\widehat{X}^U(T_1) = \widehat{X}^U_1$, we now select the next inter-jump time $T_2 - T_1$ and post-jump location $\widehat{X}^U(T_2) = \widehat{X}^U_2$ in a similar way. Let us define the components of the PDMP $\{\widehat{X}^U(t)\}$ by
$$\widehat{X}^U(t) = (X(t),Z(t),\tau(t),N(t)). \qquad (1)$$
For notational convenience, we have omitted to write explicitly the dependence on $U$ of the components $X(t)$, $Z(t)$, $\tau(t)$ and $N(t)$. From the previous construction, it is easy to see that $X(t)$ corresponds to the trajectory of the system, $Z(t)$ is the value of $X(t)$ at the last jump time before $t$, $\tau(t)$ is the time elapsed from the last jump up to time $t$, and $N(t)$ is the number of jumps of the process $\{X(t)\}$ up to time $t$. As in Davis [7], we consider the following assumption to avoid any accumulation point of the jump times:



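Assumption 2.3 For any $x \in E$, $U \in \mathcal{U}$ and $t > 0$, we have
$$\mathbb{E}^U_{(x,0)}\Big[\sum_{i=1}^{\infty} I_{\{T_i \leq t\}}\Big] < \infty.$$

Remark 2.4 In particular, a consequence of Assumption 2.3 is that $T_m \to \infty$ as $m \to \infty$, $P^U_{(x,0)}$-a.s., for all $x \in E$ and $U \in \mathcal{U}$.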

The costs of our control problem contain two terms, a running cost $f$ and a boundary cost $r$, satisfying the following properties:

Assumption 2.5 $f \in \mathbb{M}(E \times \mathbf{U})^+$ and $r \in \mathbb{M}(\partial E \times \mathbf{U})^+$.

Define, for $\alpha \geq 0$, $t \in \mathbb{R}_+$ and $U \in \mathcal{U}$,
$$J^\alpha(U,t) = \int_0^t e^{-\alpha s} f\big(X(s),u(N(s),Z(s),\tau(s))\big)\,ds + \int_0^t e^{-\alpha s} r\big(X(s-),u_\partial(N(s-),Z(s-))\big)\,dp^*(s),$$
where $p^*(t) = \sum_{i=1}^{\infty} I_{\{T_i \leq t\}} I_{\{X(T_i-) \in \partial E\}}$ counts the number of times the process hits the boundary up to time $t$, and, for notational simplicity, set $J(U,t) = J^0(U,t)$. The long run average cost we want to minimize over $\mathcal{U}$ is given by
$$\mathcal{A}(U,x) = \limsup_{t \to \infty} \frac{1}{t}\,\mathbb{E}^U_{(x,0)}[J(U,t)].$$
We need the following assumption to avoid infinite costs for the discounted case, see [?].

Assumption 2.6 For all $\alpha > 0$ and all $x \in E$, $\inf_{U \in \mathcal{U}} \mathbb{E}^U_{(x,0)}[J^\alpha(U,\infty)] < \infty$.
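To make the definition of $J(U,t)$ and of the counting process $p^*(t)$ concrete, the sketch below computes $J(U,t)/t$ along one simulated path of a toy PDMP. Everything in it is a hypothetical illustration: the model ($E = (0,1)$, $\phi(x,t) = x + t$, constant jump rate, $Q(y,\cdot) = \mathrm{Uniform}(0,y)$) has a single action, so there is no real control, the running cost is $f(x) = x$, and the boundary cost is a constant `boundary_cost`. A single path is only a crude proxy for the expectation appearing in $\mathcal{A}(U,x)$.

```python
import random


def path_average_cost(x0, horizon, rate=2.0, boundary_cost=1.0, seed=0):
    """Return J(U, horizon) / horizon along one path of a toy, uncontrolled PDMP.

    Hypothetical model: E = (0, 1), phi(x, t) = x + t, constant jump rate,
    Q(y, .) = Uniform(0, y), running cost f(x) = x, and a constant boundary
    cost paid each time the flow reaches 1 (an increment of p*(t)).
    """
    rng = random.Random(seed)
    t, x, cost = 0.0, x0, 0.0
    while t < horizon:
        t_star = 1.0 - x
        dt = min(rng.expovariate(rate), t_star)
        hit_boundary = dt == t_star
        run = min(dt, horizon - t)            # stop integrating at the horizon
        cost += x * run + 0.5 * run ** 2      # integral of f(phi(x, s)) = x + s over [0, run]
        if run < dt:                          # horizon reached before the next jump
            break
        t += dt
        if hit_boundary:
            cost += boundary_cost             # boundary term of J(U, t)
        pre_jump = 1.0 if hit_boundary else x + dt
        x = rng.uniform(0.0, pre_jump)        # post-jump location drawn from Q
    return cost / horizon
```

Because the random draws do not depend on `boundary_cost`, two calls with the same seed follow the same path, and their difference is exactly the number of boundary hits divided by the horizon.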
2.2 Discrete-time relaxed and ordinary controls

We present in this sub-section the sets of discrete-time relaxed and ordinary controls. Consider $C(\mathbf{U})$ equipped with the topology of uniform convergence and $\mathcal{M}(\mathbf{U})$ equipped with the weak$^*$ topology $\sigma(\mathcal{M}(\mathbf{U}),C(\mathbf{U}))$. For $x \in E$, define $\mathcal{P}^x(\mathbf{U})$ as the set of measures $\mu \in \mathcal{P}(\mathbf{U})$ satisfying $\mu(\mathbf{U}(\phi(x,t_*(x)))) = 1$. $\mathcal{P}(\mathbf{U})$ and $\mathcal{P}^x(\mathbf{U})$ for $x \in E$ are subsets of $\mathcal{M}(\mathbf{U})$ and are equipped with the relative topology.

Let $\mathbb{V}^r$ (respectively, $\mathbb{V}^r(x)$ for $x \in E$) be the set of all $\eta$-measurable functions $\mu$ defined on $\mathbb{R}_+$ with values in $\mathcal{P}(\mathbf{U})$ such that $\mu(t,\mathbf{U}) = 1$ $\eta$-a.e. (respectively, $\mu(t,\mathbf{U}(\phi(x,t))) = 1$ $\eta$-a.e.). It can be shown (see sub-section 3.1 in [4]) that $\mathbb{V}^r(x)$ is a compact set of the metric space $\mathbb{V}^r$: a sequence $(\mu_n)_{n \in \mathbb{N}}$ in $\mathbb{V}^r(x)$ converges to $\mu$ if and only if for all $g \in L^1(\mathbb{R}_+;C(\mathbf{U}))$
$$\lim_{n \to \infty} \int_{\mathbb{R}_+} \int_{\mathbf{U}(\phi(x,t))} g(t,u)\,\mu_n(t,du)\,dt = \int_{\mathbb{R}_+} \int_{\mathbf{U}(\phi(x,t))} g(t,u)\,\mu(t,du)\,dt.$$
The sets of relaxed controls are defined as follows: $\mathcal{V}^r(x) = \mathbb{V}^r(x) \times \mathcal{P}^x(\mathbf{U})$ for $x \in E$, and $\mathcal{V}^r = \mathbb{V}^r \times \mathcal{P}(\mathbf{U})$. The set of ordinary controls, denoted by $\mathcal{V}$ (respectively, $\mathcal{V}(x)$ for $x \in E$), is defined as above except that it is composed of deterministic functions instead of probability measures. More specifically, we have $\mathbb{V}(x) = \{\nu \in \mathbb{M}(\mathbb{R}_+;\mathbf{U}) : (\forall t \in \mathbb{R}_+),\ \nu(t) \in \mathbf{U}(\phi(x,t))\}$, $\mathcal{V}(x) = \mathbb{V}(x) \times \mathbf{U}(\phi(x,t_*(x)))$ and $\mathcal{V} = \mathbb{M}(\mathbb{R}_+;\mathbf{U}) \times \mathbf{U}$. Consequently, the set of ordinary controls is a subset of the set of relaxed controls $\mathcal{V}^r$ (respectively, $\mathcal{V}^r(x)$ for $x \in E$), obtained by identifying any control action $u \in \mathbf{U}$ with the Dirac measure concentrated on $u$. Thus we can write $\mathcal{V} \subset \mathcal{V}^r$ (respectively, $\mathcal{V}(x) \subset \mathcal{V}^r(x)$ for $x \in E$), and from now on we consider that $\mathcal{V}$ (respectively, $\mathcal{V}(x)$ for $x \in E$) is endowed with the topology generated by $\mathcal{V}^r$. The need to introduce the class of relaxed controls $\mathcal{V}^r$ is justified by the fact that in general there is no topology for which $\mathcal{V}$ and $\mathcal{V}(x)$ are compact sets.

As in [11, page 14], we need the set of feasible state/relaxed-control pairs to be a measurable subset of $\mathcal{B}(E) \times \mathcal{B}(\mathcal{V}^r)$; that is, we need the following assumption.

Assumption 2.7 $\mathcal{K} = \{(x,\Theta) : \Theta \in \mathcal{V}^r(x),\ x \in E\} \in \mathcal{B}(E) \times \mathcal{B}(\mathcal{V}^r)$.

A sufficient condition ensuring that Assumption 2.7 holds is presented in [4, Proposition 3.3].
2.3 Discrete-time operators and measurability properties

In this sub-section we present some important operators associated to the optimality equation of the discrete-time problem. We use the following notation:
$$w(x,\mu) = \int_{\mathbf{U}} w(x,u)\,\mu(du), \quad Qh(x,\mu) = \int_{\mathbf{U}} \int_E h(z)\,Q(x,u;dz)\,\mu(du), \quad \lambda Qh(x,\mu) = \int_{\mathbf{U}} \lambda(x,u) \int_E h(z)\,Q(x,u;dz)\,\mu(du),$$
for $x \in \bar{E}$, $\mu \in \mathcal{P}(\mathbf{U})$, $h \in \mathbb{M}(E)^+$ and $w \in \mathbb{M}(\bar{E} \times \mathbf{U})^+$.

The following operators will be associated to the optimality equations of the discrete-time problems that will be presented in the next sections. For $\Theta = (\mu,\mu_\partial) \in \mathcal{V}^r$ and $(x,A) \in E \times \mathcal{B}(E)$, according to Lemma 2 in [8, Appendix 5], define
$$\Lambda^\mu(x,t) = \int_0^t \lambda(\phi(x,s),\mu(s))\,ds,$$
$$G_\alpha(x,\Theta;A) = \int_0^{t_*(x)} e^{-\alpha s - \Lambda^\mu(x,s)}\,\lambda QI_A(\phi(x,s),\mu(s))\,ds + e^{-\alpha t_*(x) - \Lambda^\mu(x,t_*(x))}\,Q(\phi(x,t_*(x)),\mu_\partial;A). \qquad (2)$$
For $h \in \mathbb{M}(E)^+$, we define $G_\alpha h(x,\Theta) = \int_E h(y)\,G_\alpha(x,\Theta;dy)$. For $x \in E$, $\Theta = (\mu,\mu_\partial) \in \mathcal{V}^r$, $v \in \mathbb{M}(\bar{E} \times \mathbf{U})^+$, $w \in \mathbb{M}(\partial E \times \mathbf{U})^+$ and $\alpha \in \mathbb{R}$, introduce
$$L_\alpha v(x,\Theta) = \int_0^{t_*(x)} e^{-\alpha s - \Lambda^\mu(x,s)}\,v(\phi(x,s),\mu(s))\,ds, \qquad (3)$$
$$H_\alpha w(x,\Theta) = e^{-\alpha t_*(x) - \Lambda^\mu(x,t_*(x))}\,w(\phi(x,t_*(x)),\mu_\partial). \qquad (4)$$
For $h \in \mathbb{M}(E)$ (respectively, $v \in \mathbb{M}(\bar{E} \times \mathbf{U})$), we set $G_\alpha h(x,\Theta) = G_\alpha h^+(x,\Theta) - G_\alpha h^-(x,\Theta)$ (respectively, $L_\alpha v(x,\Theta) = L_\alpha v^+(x,\Theta) - L_\alpha v^-(x,\Theta)$), provided the difference has a meaning. It will be useful in the sequel to define the function $\mathcal{C}_\alpha(x,\Theta) = L_\alpha I_{\bar{E} \times \mathbf{U}}(x,\Theta)$. In particular, for $\alpha = 0$ we write for simplicity $G_0 = G$, $L_0 = L$, $H_0 = H$ and $\mathcal{C}_0 = \mathcal{C}$. Measurability properties of the operators $G_\alpha$, $L_\alpha$ and $H_\alpha$ are shown in [?, Proposition 3.4].
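Since $G_\alpha$, $L_\alpha$, $H_\alpha$ and $\mathcal{C}_\alpha$ are defined through one-dimensional integrals along the flow, they can be approximated by quadrature. The sketch below does this for a hypothetical toy model: $\phi(x,s) = x + s$ on $E = (0,1)$, $t_*(x) = 1 - x$, and a fixed action with constant jump rate, so that $\Lambda^\mu(x,s) = \mathrm{rate}\cdot s$; the callable `Qh` stands for $y \mapsto Qh(y,\cdot)$ under that action and is an assumption of the example. With $h \equiv 1$ and $\alpha = 0$, (2) gives $G1(x,\Theta) = 1$, which provides a simple check; for $\alpha = 0$, $\mathcal{C}(x,\Theta)$ is also the expected time until the next jump.

```python
import math


def midpoint(fun, a, b, n=2000):
    """Composite midpoint rule for the integral of `fun` over [a, b]."""
    step = (b - a) / n
    return step * sum(fun(a + (i + 0.5) * step) for i in range(n))


def toy_operators(x, rate, f, r_boundary, Qh, alpha=0.0):
    """Approximate C_alpha, L_alpha f, H_alpha r and G_alpha h at x for a hypothetical
    toy model: phi(x, s) = x + s on E = (0, 1), t_*(x) = 1 - x, and a fixed action
    with constant jump rate, so Lambda(x, s) = rate * s.
    `f(y)` is the running cost along the flow, `r_boundary` the boundary cost at
    phi(x, t_*(x)) = 1, and `Qh(y)` the integral of h against Q(y, .)."""
    t_star = 1.0 - x

    def discount(s):  # e^{-alpha s - Lambda(x, s)}
        return math.exp(-alpha * s - rate * s)

    C = midpoint(discount, 0.0, t_star)                                    # C_alpha = L_alpha 1
    Lf = midpoint(lambda s: discount(s) * f(x + s), 0.0, t_star)           # (3)
    Hr = discount(t_star) * r_boundary                                     # (4)
    Gh = (midpoint(lambda s: discount(s) * rate * Qh(x + s), 0.0, t_star)  # (2), jump part
          + discount(t_star) * Qh(1.0))                                    # (2), boundary part
    return C, Lf, Hr, Gh
```

For instance, `toy_operators(0.25, 2.0, lambda y: y, 1.0, lambda y: 1.0)` returns values of $\mathcal{C}$ and $Lf$ that agree with their closed forms $(1 - e^{-1.5})/2$ and $\int_0^{0.75} e^{-2s}(0.25 + s)\,ds$ up to quadrature error, together with $Hr = e^{-1.5}$ and $G1 \approx 1$.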

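We now present the definitions of the one-stage optimization operators.

Definition 2.8 Let $\alpha \in \mathbb{R}_+$, $\rho \in \mathbb{R}$ and $h \in \mathbb{M}(E)$. Assume that for any $x \in E$ and $\Upsilon \in \mathcal{V}(x)$, $-\rho\,\mathcal{C}_\alpha(x,\Upsilon) + L_\alpha f(x,\Upsilon) + H_\alpha r(x,\Upsilon) + G_\alpha h(x,\Upsilon)$ is well defined. The (ordinary) one-stage optimization operator is defined by
$$\mathcal{T}_\alpha(\rho,h)(x) = \inf_{\Upsilon \in \mathcal{V}(x)} \big\{-\rho\,\mathcal{C}_\alpha(x,\Upsilon) + L_\alpha f(x,\Upsilon) + H_\alpha r(x,\Upsilon) + G_\alpha h(x,\Upsilon)\big\}.$$
Assume that for any $x \in E$ and $\Theta \in \mathcal{V}^r(x)$, $-\rho\,\mathcal{C}_\alpha(x,\Theta) + L_\alpha f(x,\Theta) + H_\alpha r(x,\Theta) + G_\alpha h(x,\Theta)$ is well defined. The relaxed one-stage optimization operator is defined by
$$\mathcal{R}_\alpha(\rho,h)(x) = \inf_{\Theta \in \mathcal{V}^r(x)} \big\{-\rho\,\mathcal{C}_\alpha(x,\Theta) + L_\alpha f(x,\Theta) + H_\alpha r(x,\Theta) + G_\alpha h(x,\Theta)\big\}.$$
In particular, for $\alpha = 0$ we write for simplicity $\mathcal{T}_0 = \mathcal{T}$ and $\mathcal{R}_0 = \mathcal{R}$.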
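When the feasible controls at each state reduce to a finite menu, the infimum defining $\mathcal{T}(\rho,h)$ becomes a minimum over that menu. The sketch below assumes, purely for illustration, a finite state space and finite action sets, with hypothetical arrays `C[x][a]`, `c[x][a]` and `P[x][a][y]` standing for $\mathcal{C}(x,\cdot)$, $Lf(x,\cdot) + Hr(x,\cdot)$ and the kernel $G(x,\cdot;\{y\})$ evaluated at the control attached to action `a` (all with $\alpha = 0$). It is a discrete analogue of the operator, not the paper's construction.

```python
def one_stage(rho, h, C, c, P):
    """Finite analogue of T(rho, h): for each state x, minimise
    -rho * C[x][a] + c[x][a] + sum_y P[x][a][y] * h[y] over the actions a.
    Returns the list of minimal values and one minimising action per state."""
    values, actions = [], []
    for x in range(len(h)):
        scores = [
            -rho * C[x][a] + c[x][a] + sum(p * hy for p, hy in zip(P[x][a], h))
            for a in range(len(C[x]))
        ]
        best = min(range(len(scores)), key=scores.__getitem__)  # first minimiser
        values.append(scores[best])
        actions.append(best)
    return values, actions
```

The selected actions play the role of a measurable selector in this finite setting; ties are broken in favour of the lowest action index.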

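The sets of measurable selectors associated to $(\mathbf{U}(x))_{x \in \bar{E}}$, $(\mathcal{V}(x))_{x \in E}$ and $(\mathcal{V}^r(x))_{x \in E}$ are defined by
$$\mathcal{S}_{\mathbf{U}} = \{u \in \mathbb{M}(\bar{E};\mathbf{U}) : (\forall x \in \bar{E}),\ u(x) \in \mathbf{U}(x)\},$$
$$\mathcal{S}_{\mathcal{V}} = \{(\nu,\nu_\partial) \in \mathbb{M}(E;\mathcal{V}) : (\forall x \in E),\ (\nu(x),\nu_\partial(x)) \in \mathcal{V}(x)\},$$
$$\mathcal{S}_{\mathcal{V}^r} = \{(\mu,\mu_\partial) \in \mathbb{M}(E;\mathcal{V}^r) : (\forall x \in E),\ (\mu(x),\mu_\partial(x)) \in \mathcal{V}^r(x)\}.$$
For $\alpha \in \mathbb{R}_+$, $\rho \in \mathbb{R}$ and $v \in \mathbb{M}(E)$, the one-stage optimization problem associated to the operator $\mathcal{T}_\alpha(\rho,v)$ (respectively, $\mathcal{R}_\alpha(\rho,v)$) consists of finding a measurable selector $\Upsilon \in \mathcal{S}_{\mathcal{V}}$ (respectively, $\Theta \in \mathcal{S}_{\mathcal{V}^r}$)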
such that for all $x \in E$,
$$\mathcal{T}_\alpha(\rho,v)(x) = -\rho\,\mathcal{C}_\alpha(x,\Upsilon(x)) + L_\alpha f(x,\Upsilon(x)) + H_\alpha r(x,\Upsilon(x)) + G_\alpha v(x,\Upsilon(x)),$$
and, respectively,
$$\mathcal{R}_\alpha(\rho,v)(x) = -\rho\,\mathcal{C}_\alpha(x,\Theta(x)) + L_\alpha f(x,\Theta(x)) + H_\alpha r(x,\Theta(x)) + G_\alpha v(x,\Theta(x)).$$

Finally, we conclude this section by recalling (see Propositions 3.8 and 3.10 in [4]) that there exist two natural mappings, from $\mathcal{S}_{\mathbf{U}}$ to $\mathcal{S}_{\mathcal{V}}$ and from $\mathcal{S}_{\mathbf{U}}$ to $\mathcal{U}$.

Definition 2.9 For $u \in \mathcal{S}_{\mathbf{U}}$, define the measurable mapping $u_\phi$ of the space $E$ into $\mathcal{V}$ by
$$u_\phi : x \mapsto \big(u(\phi(x,\cdot)),\ u(\phi(x,t_*(x)))\big).$$

Definition 2.10 For $u \in \mathcal{S}_{\mathbf{U}}$, define the measurable mapping $U_{u_\phi}$ of the space $\mathbb{N} \times E \times \mathbb{R}_+$ into $\mathbf{U} \times \mathbf{U}$ by
$$U_{u_\phi} : (n,x,t) \mapsto \big(u(\phi(x,t)),\ u(\phi(x,t_*(x)))\big).$$

Remark 2.11 The measurable selectors of the kind $u_\phi$ as in Definition 2.9 are called ordinary feedback measurable selectors in the class $\mathcal{S}_{\mathcal{V}} \subset \mathcal{S}_{\mathcal{V}^r}$, and the control strategies of the kind $U_{u_\phi}$ as in Definition 2.10 are called ordinary feedback control strategies in the class $\mathcal{U}$.

3 Assumptions

In order to prove our main results, presented in Section 5, we need to impose some conditions. Assumptions 3.1, 3.2 and 3.3 are needed to guarantee some convergence and continuity properties of the one-stage optimization operators, and the existence of a measurable selector. These properties are important to ensure the convergence of the policy iteration algorithm, as shown in Section 5.1.

Assumption 3.1 For each $x \in E$, the restriction of $\lambda(x,\cdot)$ to $\mathbf{U}(x)$ is continuous; for $t \in [0,t_*(x))$,
$$\int_0^t \sup_{a \in \mathbf{U}(\phi(x,s))} \lambda(\phi(x,s),a)\,ds < \infty,$$
and if $t_*(x) < \infty$ then $\int_0^{t_*(x)} \sup_{a \in \mathbf{U}(\phi(x,s))} \lambda(\phi(x,s),a)\,ds < \infty$.

Assumption 3.2 For all $y \in E$, the restriction of $f(y,\cdot)$ to $\mathbf{U}(y)$ is continuous, and for all $z \in \partial E$, the restriction of $r(z,\cdot)$ to $\mathbf{U}(z)$ is continuous.

Assumption 3.3 For all $x \in \bar{E}$ and $h \in \mathbb{B}(E)$, the restriction of $Qh(x,\cdot)$ to $\mathbf{U}(x)$ is continuous.

The next assumption is mainly used to show that the policy iteration algorithm converges to the optimal cost and gives an optimal feedback control, as shown in Section 5.2. This condition is related to the so-called expected growth condition (see, for instance, Assumption 3.1 in [10] for the discrete-time case, or Assumption A in [9] for the continuous-time case).

Assumption 3.4 Suppose that there exist $b > 0$, $c > 0$, $\delta > 0$, $M > 0$, $g \in \mathbb{M}^{ac}(E)$ with $g \geq 1$, and $\bar{r} \in \mathbb{M}(\partial E)$ with $\bar{r}(z) > 0$, satisfying for all $x \in E$
$$[\ \ldots\ ] \qquad (5)$$
$$[\ \ldots\ ] \qquad (6)$$
and for all $x \in E$ with $t_*(x) < \infty$
$$\sup_{a \in \mathbf{U}(\phi(x,t_*(x)))} \big\{\bar{r}(\phi(x,t_*(x))) + Qg(\phi(x,t_*(x)),a)\big\} \leq g(\phi(x,t_*(x))). \qquad (7)$$
In the next assumption, notice that for any $u \in \mathcal{S}_{\mathbf{U}}$, $G(x,u_\phi(x);\cdot)$ can be seen as the stochastic kernel associated to the post-jump locations of a PDMP. This assumption is related to some geometric ergodicity properties of the operator $G$ (see for example the comments on page 122 in [13] or Lemma 3.3 in [?] for more details on this kind of assumption).

Assumption 3.5 There exist $a > 0$ and $0 < \kappa < 1$ such that for any $u \in \mathcal{S}_{\mathbf{U}}$ there exists a probability measure $\nu_u$ with $\nu_u(g) < +\infty$ and
$$\big|G^k h(x,u_\phi(x)) - \nu_u(h)\big| \leq a\,\|h\|_g\,\kappa^k\,g(x), \qquad (9)$$
for all $h \in \mathbb{B}_g(E)$ and $k \in \mathbb{N}$, where $G^k h(x,u_\phi(x))$ denotes the $k$-th iterate of the kernel $G(\cdot,u_\phi(\cdot);\cdot)$ applied to $h$.
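Inequality (9) can be checked by hand in the simplest finite analogue. For a hypothetical two-state chain with transition matrix $\begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$, standing in for the kernel $G(\cdot,u_\phi(\cdot);\cdot)$, the invariant law is $\nu = (q,p)/(p+q)$ and $P^k h - \nu(h) = (1-p-q)^k\,(h - \nu(h))$, so (9) holds with $g \equiv 1$, $a = 2$ and $\kappa = |1-p-q|$ whenever $0 < p + q < 2$. The sketch below verifies this numerically; the function names are ours.

```python
def apply_power(P, h, k):
    """Return P^k h for a finite stochastic matrix P given as a list of rows."""
    for _ in range(k):
        h = [sum(p * hy for p, hy in zip(row, h)) for row in P]
    return h


def check_bound_9(p, q, h, k_max=40):
    """Check |P^k h(x) - nu(h)| <= a * ||h||_g * kappa**k * g(x) for k <= k_max
    on a two-state chain, with the hypothetical choices g = 1, a = 2 and
    kappa = |1 - p - q| (the second eigenvalue of P)."""
    P = [[1.0 - p, p], [q, 1.0 - q]]
    nu = [q / (p + q), p / (p + q)]                      # invariant law of P
    nu_h = sum(n_x * h_x for n_x, h_x in zip(nu, h))
    a, kappa, norm = 2.0, abs(1.0 - p - q), max(abs(v) for v in h)
    return all(
        abs(v - nu_h) <= a * norm * kappa ** k + 1e-12    # small slack for round-off
        for k in range(k_max + 1)
        for v in apply_power(P, h, k)
    )
```

Here $\|h\|_g$ reduces to the sup norm because $g \equiv 1$; with an unbounded $g$, the weighted norm is what allows (9) to cover unbounded functions $h$.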



The following hypothesis is given by a Lyapunov-like inequality yielding an expected growth condition on the function $g$ with respect to $G$ (for further comments on this kind of assumption, see for example section 10.2 in [?, page 121]).

Assumption 3.6 There exist $0 < \kappa_g < 1$ and $K_g > 0$ such that for all $x \in E$ and $\Upsilon \in \mathcal{V}(x)$,
$$Gg(x,\Upsilon) \leq \kappa_g\,g(x) + K_g. \qquad (10)$$



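The final assumption is:

Assumption 3.7 There exist $\underline{\lambda} \in \mathbb{M}(E)^+$ and $K_\lambda \in \mathbb{R}_+$ such that
a) $\lambda(y,a) \geq \underline{\lambda}(y)$ for all $y \in E$ and $a \in \mathbf{U}(y)$;
b) $\int_0^{t_*(x)} e^{ct - \int_0^t \underline{\lambda}(\phi(x,s))\,ds}\,dt \leq K_\lambda$ for all $x \in E$;
c) $\lim_{t \to \infty} e^{ct - \int_0^t \underline{\lambda}(\phi(x,s))\,ds} = 0$ for all $x \in E$ with $t_*(x) = +\infty$;
d) $\lim_{t \to \infty} e^{-\int_0^t \underline{\lambda}(\phi(x,s))\,ds}\,g(\phi(x,t)) = 0$ for all $x \in E$ with $t_*(x) = \infty$;
e) $\int_0^{t_*(x)} e^{-\int_0^t \underline{\lambda}(\phi(x,s))\,ds} \sup_{a \in \mathbf{U}(\phi(x,t))} f(\phi(x,t),a)\,dt < \infty$ for all $x \in E$.

Remark 3.8 Notice the following consequences of Assumption 3.7:
i) Assumption 3.7 c) implies that $G_\alpha(x,\Theta;A) = \int_0^{\infty} e^{-\alpha s - \Lambda^\mu(x,s)}\,\lambda QI_A(\phi(x,s),\mu(s))\,ds$ and $H_\alpha w(x,\Theta) = 0$, for any $x \in E$ with $t_*(x) = +\infty$, $A \in \mathcal{B}(E)$, $\alpha > -c$, $\Theta = (\mu,\mu_\partial) \in \mathcal{V}^r(x)$ and $w \in \mathbb{M}(\partial E \times \mathbf{U})$.
ii) Assumptions 3.7 a) and b) imply that $\mathcal{C}_\alpha(x,\Theta) \leq K_\lambda$ for any $\alpha > -c$, $x \in E$ and $\Theta \in \mathcal{V}^r(x)$.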
4 A pseudo-Poisson equation

We introduce in Definition 4.1 a pseudo-Poisson equation associated to the stochastic kernel $G$. Proposition 4.3 shows that there exists a solution to such an equation. Moreover, it is proved in Proposition 4.4 that this equation has the important characteristic of ensuring the policy improvement property in the set $\mathcal{S}_{\mathbf{U}}$.

Definition 4.1 Consider $u \in \mathcal{S}_{\mathbf{U}}$. A pair $(\rho,h) \in \mathbb{R} \times \mathbb{B}_g(E)$ is said to satisfy the pseudo-Poisson equation associated to $u$ if
$$h(x) = -\rho\,\mathcal{C}(x,u_\phi(x)) + Lf(x,u_\phi(x)) + Hr(x,u_\phi(x)) + Gh(x,u_\phi(x)). \qquad (11)$$

Remark 4.2 This equation is clearly different from a classical Poisson equation encountered in the literature on discrete-time Markov control processes; see for example equation (2.13) in [12]. In particular, the constant $\rho$, which will be shown to be the optimal cost, appears here as a multiplicative factor of the mapping $\mathcal{C}(x,u_\phi(x))$, and the costs $f$ and $r$ appear through the terms $Lf(x,u_\phi(x))$ and $Hr(x,u_\phi(x))$. However, it will be shown in the following propositions that this pseudo-Poisson equation still has the good properties that we might expect in order to guarantee the convergence of the policy iteration algorithm.

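Proposition 4.3 For arbitrary $u \in \mathcal{S}_{\mathbf{U}}$ the following assertions hold:

(a) Set $\mathcal{D}_u = \int_E \mathcal{C}(y,u_\phi(y))\,\nu_u(dy)$. Then $0 < \mathcal{D}_u \leq K_\lambda$.

(b) If $v \in \mathbb{B}_g(E)$ and $b \in \mathbb{R}$ are such that for all $x \in E$,
$$v(x) = b\,\mathcal{C}(x,u_\phi(x)) + Gv(x,u_\phi(x)), \qquad (12)$$
then $b = 0$ and, for some $c_0 \in \mathbb{R}$, $v(x) = c_0$ for all $x \in E$.

(c) Define $\rho_u$ by
$$\rho_u = \frac{\int_E [Lf(y,u_\phi(y)) + Hr(y,u_\phi(y))]\,\nu_u(dy)}{\mathcal{D}_u} \geq 0, \qquad (13)$$
let $w_u$ be the mapping in $\mathbb{M}(E)$ defined by $w_u(x) = Lf(x,u_\phi(x)) + Hr(x,u_\phi(x)) - \rho_u\,\mathcal{C}(x,u_\phi(x))$ for $x \in E$, and define $h_u$ by
$$h_u(x) = \sum_{k=0}^{\infty} G^k w_u(x,u_\phi(x)). \qquad (14)$$
Then $(\rho_u,h_u) \in \mathbb{R} \times \mathbb{B}_g(E)$, and it is the unique solution to the pseudo-Poisson equation (11) associated to $u$ that satisfies
$$\nu_u(h_u) = 0. \qquad (15)$$
Moreover,
$$\|h_u\|_g \leq \frac{a M_u}{1-\kappa}, \quad \text{with } M_u := \max\Big\{\rho_u K_\lambda,\ \frac{M(1+bK_\lambda)}{c}\Big\}. \qquad (16)$$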
Proof: Item (a) is straightforward since $0 < \mathcal{C}(x,u_\phi(x)) \leq K_\lambda$ for all $x \in E$ (see Remark 3.8 ii)).

For (b), let us suppose that $b \geq 0$. Since $0 < \mathcal{C}(x,u_\phi(x))$ for all $x \in E$, it follows from (12) that $v(x) \geq Gv(x,u_\phi(x))$ for all $x \in E$, and from Lemma 4.1 (a) in [12], $v = c_0$ $\nu_u$-a.s. for some $c_0 \in \mathbb{R}$. Returning to (12) and integrating with respect to $\nu_u$, we have that $0 = b\,\mathcal{D}_u$ and so $b = 0$. Therefore, from (12), $v(x) = Gv(x,u_\phi(x))$; that is, $v$ is a harmonic function for the kernel $G(\cdot,u_\phi(\cdot);\cdot)$, and therefore $v(x) = c_0$ for all $x \in E$ (see Lemma 4.1 (a) in [12]). If $b < 0$, then from (12) it follows that $v(x) < Gv(x,u_\phi(x))$ for all $x \in E$, and from Lemma 4.1 (a) in [12], $v = c_0$ $\nu_u$-a.s. for some $c_0 \in \mathbb{R}$. Returning to (12) and integrating with respect to $\nu_u$, we have that $0 = b\,\mathcal{D}_u$, and since $\mathcal{D}_u > 0$ we have a contradiction.

For (c), we first note that from Proposition 3.12 in [5],
$$0 \leq Lf(x,u_\phi(x)) + Hr(x,u_\phi(x)) \leq \frac{M(1+bK_\lambda)}{c}\,g(x),$$
so that clearly $\int_E |Lf(y,u_\phi(y)) + Hr(y,u_\phi(y))|\,\nu_u(dy) < +\infty$, and thus (13) is well defined. Moreover, $0 \leq \rho_u\,\mathcal{C}(x,u_\phi(x)) \leq \rho_u K_\lambda$, and thus $w_u \in \mathbb{B}_g(E)$ with $\|w_u\|_g \leq M_u$, where $M_u$ is defined in (16). We also have from (13) that
$$\int_E w_u(y)\,\nu_u(dy) = \int_E [Lf(y,u_\phi(y)) + Hr(y,u_\phi(y))]\,\nu_u(dy) - \rho_u\,\mathcal{D}_u = 0, \qquad (17)$$
and thus, from (9),
$$\big|G^k w_u(x,u_\phi(x))\big| = \big|G^k w_u(x,u_\phi(x)) - \nu_u(w_u)\big| \leq a M_u \kappa^k g(x), \qquad (18)$$
for all $x \in E$ and $k \in \mathbb{N}$. From (14) and (18) it is clear that
$$|h_u(x)| \leq \frac{a M_u}{1-\kappa}\,g(x), \qquad (19)$$
showing that $h_u$ is in $\mathbb{B}_g(E)$ and satisfies (16). We also have from (14) that
$$h_u(x) - w_u(x) = \sum_{k=1}^{\infty} G^k w_u(x,u_\phi(x)) = Gh_u(x,u_\phi(x)),$$
showing that $(\rho_u,h_u) \in \mathbb{R} \times \mathbb{B}_g(E)$ satisfies (11). Moreover, integrating (14) term by term with respect to $\nu_u$ (which is justified by (18) and $\nu_u(g) < +\infty$) and using the invariance of $\nu_u$ already exploited in (b), namely $\nu_u(G^k w_u) = \nu_u(w_u) = 0$, we obtain (15).

If $(\rho_i,h_i) \in \mathbb{R} \times \mathbb{B}_g(E)$, $i = 1,2$, are two solutions to the pseudo-Poisson equation (11) satisfying (15), then setting $v = h_1 - h_2$ and $b = \rho_2 - \rho_1$ we get that (12) is satisfied, and uniqueness follows from (b) together with (15). □

From now on, $(\rho_u,h_u)$ will denote the unique solution of the pseudo-Poisson equation (11) that satisfies $\nu_u(h_u) = 0$.
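Formulas (13) and (14) translate directly into a computation once everything is finite. For $\alpha = 0$, $\mathcal{C}(x,u_\phi(x))$ is the expected time until the next jump and $Lf(x,u_\phi(x)) + Hr(x,u_\phi(x))$ the expected cost incurred up to that jump, so the sketch below uses a hypothetical finite-state analogue in which `C[x]`, `c[x]` and `P[x][y]` play the roles of $\mathcal{C}(x,u_\phi(x))$, $Lf(x,u_\phi(x)) + Hr(x,u_\phi(x))$ and $G(x,u_\phi(x);\{y\})$. It approximates $\nu_u$ by power iteration and $h_u$ by truncating the series (14); both steps presume an ergodic, aperiodic chain, in the spirit of (9), which is an assumption of the example.

```python
def evaluate_policy(C, c, P, tol=1e-12, max_iter=100_000):
    """Finite-state analogue of Proposition 4.3 (c) for one fixed policy.

    C[x], c[x] : sojourn factor and one-stage cost at state x under the policy;
    P[x][y]    : post-jump transition probabilities under the policy.
    Returns (rho, h, nu): rho as in (13), h approximating (14) and normalised
    as in (15), and nu an approximate invariant law of P.
    """
    n = len(C)
    nu = [1.0 / n] * n
    for _ in range(max_iter):                      # power iteration for nu
        new = [sum(nu[x] * P[x][y] for x in range(n)) for y in range(n)]
        done = max(abs(a - b) for a, b in zip(new, nu)) < tol
        nu = new
        if done:
            break
    D = sum(nu[x] * C[x] for x in range(n))        # D_u
    rho = sum(nu[x] * c[x] for x in range(n)) / D  # (13)
    w = [c[x] - rho * C[x] for x in range(n)]      # w_u, so that nu(w_u) = 0
    h, term = [0.0] * n, w[:]
    for _ in range(max_iter):                      # partial sums of (14)
        h = [hx + tx for hx, tx in zip(h, term)]
        term = [sum(P[x][y] * term[y] for y in range(n)) for x in range(n)]
        mean = sum(nu[x] * term[x] for x in range(n))
        if max(abs(tx - mean) for tx in term) < tol:
            break
    shift = sum(nu[x] * h[x] for x in range(n))    # remove round-off drift, enforce (15)
    return rho, [hx - shift for hx in h], nu
```

For instance, with `C = [1.0, 2.0]`, `c = [3.0, 1.0]` and `P = [[0.5, 0.5], [0.25, 0.75]]`, the invariant law is $(1/3, 2/3)$, so (13) gives $\rho = 1$, and (14) gives $h = (8/3, -4/3)$, which indeed satisfies the finite counterpart of (11) and $\nu(h) = 0$.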

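The properties given in the following proposition are important for showing the convergence of the PIA.

Proposition 4.4 Consider $u \in \mathcal{S}_{\mathbf{U}}$. Then there exists $\tilde{u} \in \mathcal{S}_{\mathbf{U}}$ such that
$$\mathcal{R}(\rho_u,h_u)(x) = -\rho_u\,\mathcal{C}(x,\tilde{u}_\phi(x)) + Lf(x,\tilde{u}_\phi(x)) + Hr(x,\tilde{u}_\phi(x)) + Gh_u(x,\tilde{u}_\phi(x)), \qquad (20)$$
and $\rho_{\tilde{u}} \leq \rho_u$.

Proof: From Theorem 3.22 in [?], there exists $\tilde{u} \in \mathcal{S}_{\mathbf{U}}$ such that (20) holds. Clearly, for every $x \in E$ we have $h_u(x) \geq \mathcal{R}(\rho_u,h_u)(x)$, since $h_u$ satisfies (11) and $u_\phi(x) \in \mathcal{V}(x) \subset \mathcal{V}^r(x)$; that is, from (20),
$$h_u(x) \geq -\rho_u\,\mathcal{C}(x,\tilde{u}_\phi(x)) + Lf(x,\tilde{u}_\phi(x)) + Hr(x,\tilde{u}_\phi(x)) + Gh_u(x,\tilde{u}_\phi(x)).$$
Integrating the previous inequality with respect to $\nu_{\tilde{u}}$, and recalling the definition of $\mathcal{D}_{\tilde{u}}$ (see item (a) in Proposition 4.3), the definition (13) of $\rho_{\tilde{u}}$, and $\int_E Gh_u(y,\tilde{u}_\phi(y))\,\nu_{\tilde{u}}(dy) = \int_E h_u(y)\,\nu_{\tilde{u}}(dy)$, we get that
$$\int_E h_u(y)\,\nu_{\tilde{u}}(dy) \geq -\rho_u\,\mathcal{D}_{\tilde{u}} + \rho_{\tilde{u}}\,\mathcal{D}_{\tilde{u}} + \int_E h_u(y)\,\nu_{\tilde{u}}(dy),$$
that is, $(\rho_u - \rho_{\tilde{u}})\,\mathcal{D}_{\tilde{u}} \geq 0$, and since $\mathcal{D}_{\tilde{u}} > 0$ we get that $\rho_u \geq \rho_{\tilde{u}}$. □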

5 The Policy Iteration Algorithm

Having studied the pseudo-Poisson equation defined in Section 4, we are now in a position to analyze the policy iteration algorithm. In the first part, it is shown that the convergence of the policy iteration algorithm holds under a classical hypothesis (see for example assumption (H1) of Theorem 4.3 in [12]). Roughly speaking, this means that if the PIA computes a solution $(\rho_n,h_n)$ at the $n$th step, then $(\rho_n,h_n) \to (\rho,h)$ and $(\rho,h)$ satisfies the optimality equation (24). However, it is far from obvious that $\rho$ is actually the optimal cost for the long run average cost problem of the PDMP $\{X(t)\}$, and that there exists an optimal control. In the second part of this section, these two issues are studied. In particular, we show that $\rho = \inf_{U \in \mathcal{U}} \mathcal{A}(U,x)$ and that the measurable selector $\hat{u}$ of the optimality equation (24) provides an optimal control of the feedback form $U_{\hat{u}_\phi}$ for the process $\{X(t)\}$:
$$\inf_{U \in \mathcal{U}} \mathcal{A}(U,x) = \mathcal{A}(U_{\hat{u}_\phi},x).$$

The policy iteration algorithm performs the following steps:

Step 1: Initialize with an arbitrary $u_0 \in \mathcal{S}_{\mathbf{U}}$, and set $n = 0$.

Step 2: Policy evaluation. At the $n$th iteration, consider $u_n \in \mathcal{S}_{\mathbf{U}}$ and evaluate $(\rho_n,h_n) \in \mathbb{R} \times \mathbb{B}_g(E)$, the (unique) solution of the pseudo-Poisson equation (11), (15) given by (13) and (14) with $u$ replaced by $u_n$; thus we have
$$h_n(x) = -\rho_n\,\mathcal{C}(x,(u_n)_\phi(x)) + Lf(x,(u_n)_\phi(x)) + Hr(x,(u_n)_\phi(x)) + Gh_n(x,(u_n)_\phi(x)), \qquad (21)$$
with $\nu_{u_n}(h_n) = 0$.

Step 3: Policy improvement. Determine $u_{n+1} \in \mathcal{S}_{\mathbf{U}}$ such that
$$\mathcal{R}(\rho_n,h_n)(x) = -\rho_n\,\mathcal{C}(x,(u_{n+1})_\phi(x)) + Lf(x,(u_{n+1})_\phi(x)) + Hr(x,(u_{n+1})_\phi(x)) + Gh_n(x,(u_{n+1})_\phi(x)), \qquad (22)$$
then replace $n$ by $n+1$ and return to Step 2.

Notice that, from Propositions 4.3 and 4.4, the sequences $(\rho_n,h_n) \in \mathbb{R} \times \mathbb{B}_g(E)$ and $u_n \in \mathcal{S}_{\mathbf{U}}$ are well defined and, moreover, $\rho_n \geq \rho_{n+1} \geq 0$. We set $\rho = \lim_{n \to \infty} \rho_n$.
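Steps 1 to 3 can be sketched for a hypothetical finite analogue: a finite state space, finite action sets, and arrays `Cs[x][a]`, `cs[x][a]` and `Ps[x][a][y]` standing for $\mathcal{C}$, $Lf + Hr$ and $G$ under action `a`. Two implementation choices are ours and are not part of the algorithm as stated above: policy evaluation uses power iteration plus a truncated series (which presumes an ergodic, aperiodic chain for every policy), and the improvement step keeps the current action whenever it is already minimal, a common device to avoid cycling between tied actions. The loop stops when the policy no longer changes.

```python
def evaluate(C, c, P, tol=1e-12, max_iter=100_000):
    """Policy evaluation (Step 2): rho from (13) and h from (14)-(15)."""
    n = len(C)
    nu = [1.0 / n] * n
    for _ in range(max_iter):                      # invariant law by power iteration
        new = [sum(nu[x] * P[x][y] for x in range(n)) for y in range(n)]
        done = max(abs(a - b) for a, b in zip(new, nu)) < tol
        nu = new
        if done:
            break
    rho = sum(nu[x] * c[x] for x in range(n)) / sum(nu[x] * C[x] for x in range(n))
    h, term = [0.0] * n, [c[x] - rho * C[x] for x in range(n)]
    for _ in range(max_iter):                      # partial sums of (14)
        h = [hx + tx for hx, tx in zip(h, term)]
        term = [sum(P[x][y] * term[y] for y in range(n)) for x in range(n)]
        mean = sum(nu[x] * term[x] for x in range(n))
        if max(abs(tx - mean) for tx in term) < tol:
            break
    shift = sum(nu[x] * h[x] for x in range(n))    # normalisation (15)
    return rho, [hx - shift for hx in h]


def improve(rho, h, Cs, cs, Ps, current, tie_tol=1e-10):
    """Policy improvement (Step 3): minimise the one-stage expression,
    keeping the current action when it is already (numerically) optimal."""
    values, policy = [], []
    for x, a_cur in enumerate(current):
        scores = [-rho * Cs[x][a] + cs[x][a] + sum(p * hy for p, hy in zip(Ps[x][a], h))
                  for a in range(len(Cs[x]))]
        best = min(range(len(scores)), key=scores.__getitem__)
        if scores[a_cur] <= scores[best] + tie_tol:
            best = a_cur
        values.append(scores[best])
        policy.append(best)
    return values, policy


def policy_iteration(Cs, cs, Ps, policy, max_steps=100):
    """Run Steps 1-3 from the initial policy; returns (rho, h, policy, rho history)."""
    history = []
    for _ in range(max_steps):
        C = [Cs[x][a] for x, a in enumerate(policy)]
        c = [cs[x][a] for x, a in enumerate(policy)]
        P = [Ps[x][a] for x, a in enumerate(policy)]
        rho, h = evaluate(C, c, P)
        history.append(rho)
        _, new_policy = improve(rho, h, Cs, cs, Ps, policy)
        if new_policy == policy:
            break
        policy = new_policy
    return rho, h, policy, history


if __name__ == "__main__":
    Cs = [[1.0, 0.5], [1.0, 2.0]]
    cs = [[2.0, 1.5], [0.5, 3.0]]
    Ps = [[[0.5, 0.5], [0.9, 0.1]], [[0.3, 0.7], [0.6, 0.4]]]
    print(policy_iteration(Cs, cs, Ps, [1, 1]))
```

In this small example the sequence of evaluated costs is nonincreasing, mirroring $\rho_n \geq \rho_{n+1}$, and at termination the finite counterpart of the optimality equation (24) holds, since no action improves on the current one.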
5.1 Convergence of the PIA

First we present in the next result some convergence properties of $G$, $H$, $L$ and $\mathcal{C}$.

Proposition 5.1 Consider $h \in \mathbb{B}_g(E)$ and a sequence of functions $(h_n)_{n \in \mathbb{N}}$ in $\mathbb{B}_g(E)$ such that for all $x \in E$, $\lim_{n \to \infty} h_n(x) = h(x)$, and there exists $K_h$ satisfying $|h_n(x)| \leq K_h\,g(x)$ for all $n$ and all $x \in E$. For $x \in E$, consider $\Theta_n = (\mu_n,\mu_{\partial,n}) \in \mathcal{V}^r(x)$ and $\Theta = (\mu,\mu_\partial) \in \mathcal{V}^r(x)$ such that $\Theta_n \to \Theta$. We have the following results:

a) $\lim_{n \to \infty} \mathcal{C}(x,\Theta_n) = \mathcal{C}(x,\Theta)$, b) $\lim_{n \to \infty} Lf(x,\Theta_n) = Lf(x,\Theta)$,

c) $\lim_{n \to \infty} Hr(x,\Theta_n) = Hr(x,\Theta)$, d) $\lim_{n \to \infty} Gh_n(x,\Theta_n) = Gh(x,\Theta)$.

Proof: The proof of item a) is the same as that of Proposition 5.7 in [?], and it is essentially based on the fact that $\lim_{n \to \infty} \Lambda^{\mu_n}(x,t) = \Lambda^{\mu}(x,t)$, by using Assumption 3.1.

Item b) We have for $x \in E$,
$$Lf(x,\Theta_n) - Lf(x,\Theta) = \int_0^{t_*(x)} \big[e^{-\Lambda^{\mu_n}(x,t)} - e^{-\Lambda^{\mu}(x,t)}\big] f(\phi(x,t),\mu_n(t))\,dt + \int_0^{t_*(x)} e^{-\Lambda^{\mu}(x,t)} \big[f(\phi(x,t),\mu_n(t)) - f(\phi(x,t),\mu(t))\big]\,dt.$$
By combining items a) and e) of Assumption 3.7 and the dominated convergence theorem, we obtain
$$\lim_{n \to \infty} \int_0^{t_*(x)} \big|e^{-\Lambda^{\mu_n}(x,t)} - e^{-\Lambda^{\mu}(x,t)}\big|\,f(\phi(x,t),\mu_n(t))\,dt = 0.$$
Therefore, we obtain item b) by using Assumption 3.2.

Item c) Let us first consider the case $t_*(x) = \infty$. From item i) of Remark 3.8 it follows that $Hr(x,\Theta_n) = Hr(x,\Theta) = 0$. Suppose now that $t_*(x) < \infty$ and set $z = \phi(x,t_*(x))$. From Assumption 3.2 it follows that $\lim_{n \to \infty} r(z,\mu_{\partial,n}) = r(z,\mu_\partial)$, which, together with $\lim_{n \to \infty} \Lambda^{\mu_n}(x,t_*(x)) = \Lambda^{\mu}(x,t_*(x))$ (as in the proof of item a)), shows item c).

Item d) Let $\{\alpha_n\}$ be a nonincreasing sequence of positive numbers with $\alpha_n \downarrow 0$. We clearly have $\liminf_{n \to \infty} Gh_n(x,\Theta_n) \geq \liminf_{n \to \infty} G_{\alpha_n} h_n(x,\Theta_n)$. It follows that $\liminf_{n \to \infty} Gh_n(x,\Theta_n) \geq Gh(x,\Theta)$ by applying Proposition 3.18 in [5]. Replacing $h_n$ by $-h_n$ gives $\limsup_{n \to \infty} Gh_n(x,\Theta_n) \leq Gh(x,\Theta)$, completing the proof of item d). □
We now consider the following assumption.

Assumption 5.2 There exist a subsequence $\{h_k\}$ of $\{h_n\}$ and $h \in \mathbb{M}(E)$ such that for each $x \in E$,
$$\lim_{k \to \infty} h_k(x) = h(x). \qquad (23)$$

The following theorem is the main result of this subsection. It shows the convergence of the PIA and ensures the existence of a measurable selector for the optimality equation.
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Theorem 5.3 We have that (p,h) 6Rx B 9 (£') satisfies the optimality equation: 

h(x)=H{p,h)(x). (24) 

Moreover there exists u G Sn such that 

h(x) = -pC(x, u^(x)) + Lf(x, u^(x)) + Hr(x, u<j,(x)) + Gh(x, u<t>{x)). (25) 
Proof: From (|16p and recalling that p n > p n +\ we get that for all k, 

\\h k \\ 9 <M:= a -^, M UQ :=^ { p K x M{1 + bK ^ }. (26) 

1 K C 

From (j26j) we get that /i € M g (E), where h is as in (|23j) . Consider u& G <5>u the measurable selector 
associated to (pk,hf.) as in (f2Tj) We have that for each x € E, Y r (x) is compact and {(uk)^} is a 
sequence in Syr. Then according to Proposition 8.3 in [13] (see also [13]) there exists € Sy such 
that Q(x) € V r (x) is an accumulation point of {(wfc)</,(x)} for each x E E. Therefore for every x E. E, 
there exists a subsequence ki = ki(x) such that Umj_ voo (nfc i )^(a;) = 6(x). We fix now x E E and we 
consider the sub (x) as above. From Proposition 15.11 and taking the limit in (|21|) for 

n = ki as i — > oo we have that 

/i(x) = -p£(z, 0(x)) + I/(z, 0(x)) + ffr(x, 6(z)) + G/i(x, 0(x)), (27) 

and thus clearly /i(x) > 7£(/0, h)(x). On the other hand from (I21|) and (|22|) we have that 

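$$\begin{aligned}
\mathcal{R}(\rho_{n-1},h_{n-1})(x) &+ (\rho_{n-1}-\rho_n)\mathcal{L}(x,(u_n)_\phi(x)) + G(h_n-h_{n-1})(x,(u_n)_\phi(x))\\
&= -\rho_n\mathcal{L}(x,(u_n)_\phi(x)) + Lf(x,(u_n)_\phi(x)) + Hr(x,(u_n)_\phi(x)) + Gh_n(x,(u_n)_\phi(x))\\
&= h_n(x).
\end{aligned} \tag{28}$$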



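From (28) it is immediate that for any $\Theta\in\mathcal{S}_{\mathcal{V}}$,
$$\begin{aligned}
h_n(x) \le{}& -\rho_{n-1}\mathcal{L}(x,\Theta(x)) + Lf(x,\Theta(x)) + Hr(x,\Theta(x)) + Gh_{n-1}(x,\Theta(x))\\
&+ (\rho_{n-1}-\rho_n)\mathcal{L}(x,(u_n)_\phi(x)) + G(h_n-h_{n-1})(x,(u_n)_\phi(x)).
\end{aligned} \tag{29}$$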

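Fix $x$ and $k_i=k_i(x)$ as before, and notice that for any $y\in E$, $\lim_{i\to\infty}\big(h_{k_i}(y)-h_{k_i-1}(y)\big)=0$ and, from (26), $\|h_{k_i}-h_{k_i-1}\|_g\le 2M$. Applying Proposition 5.1 to (29) with $n$ replaced by $k_i$ and taking the limit as $i\to\infty$ yields
$$h(x)\le -\rho\mathcal{L}(x,\Theta(x)) + Lf(x,\Theta(x)) + Hr(x,\Theta(x)) + Gh(x,\Theta(x)), \tag{30}$$
and since $\Theta\in\mathcal{S}_{\mathcal{V}}$ is arbitrary, (30) gives $h(x)\le\mathcal{R}(\rho,h)(x)$. Together with $h(x)\ge\mathcal{R}(\rho,h)(x)$, this yields (24). □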
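The proof above is carried out on a general Borel state space. To make the mechanics of the PIA concrete, the sketch below runs a discrete-time, finite-state counterpart under simplifying assumptions that are ours and not the paper's: finitely many states and actions, a unichain transition structure, the term $\mathcal{L}$ identically equal to 1, and the operator $G$ replaced by a transition matrix, so that (24) reads $h(x)+\rho=\min_a\big[c(x,a)+\sum_y P(y\mid x,a)h(y)\big]$. Policy evaluation solves the corresponding Poisson equation with the normalization $h(0)=0$, and the improvement step plays the role of the measurable selector. All function names and the data are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= factor * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(c, P, policy) -> Tuple[float, List[float]]:
    """Policy evaluation: solve h = c_u - rho + P_u h with h[0] = 0.

    Unknowns are z = (rho, h[1], ..., h[S-1]); fixing h[0] = 0 removes the
    additive constant left free by the Poisson equation.
    """
    S = len(c)
    A, rhs = [], []
    for s in range(S):
        a = policy[s]
        A.append([1.0] + [(1.0 if s == j else 0.0) - P[s][a][j] for j in range(1, S)])
        rhs.append(c[s][a])
    z = solve_linear(A, rhs)
    return z[0], [0.0] + z[1:]


def q_values(c, P, h, s) -> List[float]:
    """c(s, a) + sum_j P(j | s, a) h(j) for every action a."""
    return [c[s][a] + sum(P[s][a][j] * h[j] for j in range(len(h)))
            for a in range(len(c[s]))]


def policy_iteration(c, P, policy, tol=1e-12, max_iter=100):
    """Average-cost PIA for a finite unichain MDP; returns (rho, h, policy, rho history)."""
    policy = list(policy)
    history = []
    for _ in range(max_iter):
        rho, h = evaluate(c, P, policy)
        history.append(rho)
        new_policy = []
        for s in range(len(c)):
            q = q_values(c, P, h, s)
            best = min(q)
            # Keep the current action on ties so the iteration cannot cycle.
            new_policy.append(policy[s] if q[policy[s]] <= best + tol else q.index(best))
        if new_policy == policy:
            return rho, h, policy, history
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")


# Hypothetical two-state, two-action example: c[s][a] and P[s][a][j].
c = [[2.0, 0.5], [1.0, 3.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
rho, h, policy, history = policy_iteration(c, P, [0, 0])
```

On these illustrative data the iteration stops after one improvement at the policy $(1,0)$ with $\rho=23/30$ and $h=(0,1/3)$. The recorded averages are nonincreasing, the finite counterpart of $\rho_n\ge\rho_{n+1}$, and the returned pair satisfies the discrete optimality equation state by state.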

5.2 Optimality of the PIA 

We next present a definition that will be useful for the subsequent results.

Definition 5.4 For any $\Theta=(\mu,\mu_\partial)\in\mathcal{V}$, define
$$[\Theta]_t=(\mu(\cdot+t),\mu_\partial). \tag{31}$$
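A minimal sketch of the shift (31), assuming for illustration only that the component $\mu$ can be modeled as a plain function of the elapsed time $s\ge 0$ and $\mu_\partial$ as a fixed boundary action; in the paper $\mu$ is a more general object. The class `RelaxedControl` and the function `shift` are hypothetical names. The sketch also exhibits the semigroup property $[[\Theta]_a]_b=[\Theta]_{a+b}$, which follows directly from (31).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RelaxedControl:
    """Hypothetical encoding of Theta = (mu, mu_boundary).

    mu: control used along the flow, modeled here as a function of time s >= 0.
    mu_boundary: control applied when the flow hits the boundary.
    """
    mu: Callable[[float], float]
    mu_boundary: float


def shift(theta: RelaxedControl, t: float) -> RelaxedControl:
    """[Theta]_t = (mu(. + t), mu_boundary): advance the flow control by t."""
    return RelaxedControl(lambda s: theta.mu(s + t), theta.mu_boundary)


theta = RelaxedControl(lambda s: s * s, 0.5)
# Shifting by 1 and then by 2 agrees with shifting by 3; the boundary part is untouched.
print(shift(shift(theta, 1.0), 2.0).mu(0.5), shift(theta, 3.0).mu(0.5))
```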
Recall that the PDMP $\{\widehat{X}^U(t)\}$ and its associated components $X(t)$, $Z(t)$, $N(t)$, $\tau(t)$ were introduced in Section 2.1 (see in particular equation (1)). We need several auxiliary results (Propositions 5.5 and 5.6 and Corollary 5.7) to show that the PIA actually provides an optimal solution for the average cost problem of the PDMP $X(t)$.



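Proposition 5.5 For $\hat y=(y,z,s,n)\in\widehat E$ and $U=(u,u_\partial)\in\mathbb{M}(\mathbb{N}\times E\times\mathbb{R}_+;\mathbb{U})\times\mathbb{M}(\mathbb{N}\times E;\mathbb{U})$, define $\Gamma^U(n,z)=(u(n,z,\cdot),u_\partial(n,z))\in\mathcal{V}$. For $\epsilon\in(0,c)$ introduce
$$w^U(\hat y)=\bar c\,L_{-\epsilon}f(y,[\Gamma^U(n,z)]_s)+H_{-\epsilon}r(y,[\Gamma^U(n,z)]_s)+G_{-\epsilon}g(y,[\Gamma^U(n,z)]_s)-b\,\mathcal{L}_{-\epsilon}(y,[\Gamma^U(n,z)]_s), \tag{32}$$
where $\bar c=c-\epsilon$. Then for all $x\in E$ and $U\in\mathcal{U}$ we have
$$\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t))\big]\le e^{-\epsilon t}g(x)+\frac{b}{\epsilon}\big[1-e^{-\epsilon t}\big]. \tag{33}$$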



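Proof: For $\hat y=(y,z,s,n)\in\widehat E$ and $U=(u,u_\partial)\in\mathbb{M}(\mathbb{N}\times E\times\mathbb{R}_+;\mathbb{U})\times\mathbb{M}(\mathbb{N}\times E;\mathbb{U})$, define $f^U(\hat y)=f(y,u(n,z,s))$, $r^U(\hat y)=r(y,u_\partial(n,z))$, $\hat g(\hat y)=g(y)$ and, for $t\in\mathbb{R}_+$, $\Lambda^U(\hat y,t)=\Lambda^U(y,n,t)$.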



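It is easy to show that $w^U\in\mathbb{M}(\widehat E)$. Moreover, for $\hat y=(y,z,s,n)\in\widehat E$ and $U=(u,u_\partial)\in\mathbb{M}(\mathbb{N}\times E\times\mathbb{R}_+;\mathbb{U})\times\mathbb{M}(\mathbb{N}\times E;\mathbb{U})$ satisfying $[\Gamma^U(n,z)]_s\in\mathcal{V}(y)$, we have, by using Corollary 3.11 in [5] with $\alpha=-\epsilon$, that
$$\bar c\,L_{-\epsilon}f(y,[\Gamma^U(n,z)]_s)+H_{-\epsilon}r(y,[\Gamma^U(n,z)]_s)+G_{-\epsilon}g(y,[\Gamma^U(n,z)]_s)-b\,\mathcal{L}_{-\epsilon}(y,[\Gamma^U(n,z)]_s)\le g(y). \tag{34}$$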



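Moreover, from Remark 3.8 ii),
$$0\le\mathcal{L}_{-\epsilon}(y,[\Gamma^U(n,z)]_s)\le\mathcal{L}_{-c}(y,[\Gamma^U(n,z)]_s)\le K_1. \tag{35}$$
From now on, consider $U=(u,u_\partial)\in\mathcal{U}$. Notice that for any $\hat x=(x,x,0,k)\in\widehat E$,
$$\begin{aligned}
w^U(\hat x) ={}& \bar c\,L_{-\epsilon}f(x,\Gamma^U(k,x))+H_{-\epsilon}r(x,\Gamma^U(k,x))+G_{-\epsilon}g(x,\Gamma^U(k,x))-b\,\mathcal{L}_{-\epsilon}(x,\Gamma^U(k,x))\\
={}& \int_0^{t^*(x)} e^{\epsilon s-\Lambda^{\nu_k}(x,s)}\big[-b+\bar c f(\phi(x,s),\nu_k(s))+\lambda(\phi(x,s),\nu_k(s))Qg(\phi(x,s),\nu_k(s))\big]ds\\
&+e^{\epsilon t^*(x)-\Lambda^{\nu_k}(x,t^*(x))}\big[Qg(\phi(x,t^*(x)),u_\partial(k,x))+r(\phi(x,t^*(x)),u_\partial(k,x))\big],
\end{aligned} \tag{36}$$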



with $\nu_k(\cdot)=u(k,x,\cdot)$. Since $\Gamma^U(k,x)\in\mathcal{V}(x)$ for all $k\in\mathbb{N}$ and $x\in E$, it follows from equation (34) that
$$w^U(\hat x)\le g(x). \tag{37}$$
Moreover, since $[\Gamma^U(N(t),Z(t))]_{\tau(t)}\in\mathcal{V}(X(t))$, inequality (35) implies that
$$J_m(t,\hat x):=\mathbb{E}^U_{\hat x}\Big[\int_0^{t\wedge T_m}e^{\epsilon s}\big(\bar c f^U(\widehat X(s))-b\big)ds+\int_0^{t\wedge T_m}e^{\epsilon s}r^U(\widehat X(s-))\,dp^*(s)+e^{\epsilon(t\wedge T_m)}w^U(\widehat X(t\wedge T_m))\Big]$$
is well defined for any $\hat x=(x,x,0,k)\in\widehat E$.

Let us show by induction on $m\in\mathbb{N}$ that $J_m(t,\hat x)\le g(x)$ for all $t\in\mathbb{R}_+$ and $\hat x=(x,x,0,k)\in\widehat E$. Clearly, $J_0(t,\hat x)=w^U(\hat x)$. Consequently, from equation (37), $J_0(t,\hat x)\le g(x)$ for all $t\in\mathbb{R}_+$ and $\hat x=(x,x,0,k)\in\widehat E$. Now assume that for some $m\in\mathbb{N}$ we have $J_m(t,\hat x)\le g(x)$ for all $t\in\mathbb{R}_+$ and $\hat x=(x,x,0,k)\in\widehat E$. Following the same arguments as in the proof of Proposition 4.3 in [?], it is easy to show that for $t\in\mathbb{R}_+$



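$$\begin{aligned}
J_{m+1}(t,\hat x)\le{}& \int_0^{t\wedge t^*(x)} e^{\epsilon s-\Lambda^{\nu_k}(x,s)}\big[-b+\bar c f(\phi(x,s),\nu_k(s))+\lambda(\phi(x,s),\nu_k(s))Qg(\phi(x,s),\nu_k(s))\big]ds\\
&+I_{\{t\ge t^*(x)\}}\,e^{\epsilon t^*(x)-\Lambda^{\nu_k}(x,t^*(x))}\big[Qg(\phi(x,t^*(x)),u_\partial(k,x))+r(\phi(x,t^*(x)),u_\partial(k,x))\big]\\
&+I_{\{t<t^*(x)\}}\,e^{\epsilon t-\Lambda^{\nu_k}(x,t)}\,w^U(\hat\phi(\hat x,t)).
\end{aligned} \tag{38}$$
Now if $t<t^*(x)$, then by using the fact that $\hat\phi(\hat x,t)=(\phi(x,t),x,t,k)$ we get that
$$w^U(\hat\phi(\hat x,t))=\bar c\,L_{-\epsilon}f(\phi(x,t),[\Gamma^U(k,x)]_t)+H_{-\epsilon}r(\phi(x,t),[\Gamma^U(k,x)]_t)+G_{-\epsilon}g(\phi(x,t),[\Gamma^U(k,x)]_t)-b\,\mathcal{L}_{-\epsilon}(\phi(x,t),[\Gamma^U(k,x)]_t),$$
and it follows, by applying Proposition 4.2 in [?], that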



$$w^U(\hat x)=\int_0^{t} e^{\epsilon s-\Lambda^{\nu_k}(x,s)}\big[-b+\bar c f(\phi(x,s),\nu_k(s))+\lambda(\phi(x,s),\nu_k(s))Qg(\phi(x,s),\nu_k(s))\big]ds+e^{\epsilon t-\Lambda^{\nu_k}(x,t)}\,w^U(\hat\phi(\hat x,t)). \tag{39}$$



Therefore, combining equations (38) and (39), we get that $J_{m+1}(t,\hat x)\le w^U(\hat x)$, and by using equation (37) we have $J_{m+1}(t,\hat x)\le g(x)$.

If $t\ge t^*(x)$, then equations (36) and (38) yield $J_{m+1}(t,\hat x)\le w^U(\hat x)$. By using equation (37), we have $J_{m+1}(t,\hat x)\le g(x)$, which shows that for all $m\in\mathbb{N}$, $J_m(t,\hat x)\le g(x)$ for all $t\in\mathbb{R}_+$ and $\hat x=(x,x,0,k)\in\widehat E$.
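To see the representation (36) and the flow-splitting identity (39) at work, the sketch below evaluates $w^U$ on a deliberately trivial, hypothetical one-dimensional example that is not taken from the paper: flow $\phi(x,s)=x+s$ on $[0,1)$, so $t^*(x)=1-x$; a constant jump rate $\lambda$, so $\Lambda^{\nu_k}(x,s)=\lambda s$; and constant $f$, $r$ and $Qg$, so the control $\nu_k$ plays no role. It compares a Simpson-rule evaluation of (36) with its closed form, and checks (39) for one $t<t^*(x)$. All constants are illustrative.

```python
import math

# Hypothetical toy data: constant jump rate, costs, and post-jump mean of g.
EPS, LAM = 0.1, 2.0           # epsilon in (0, c) and the jump rate lambda
B, CBAR = 1.0, 0.9            # constants b and c_bar = c - epsilon
F0, R0, QG = 0.5, 0.3, 2.0    # f, boundary cost r, and Qg (all constant here)
K = -B + CBAR * F0 + LAM * QG  # bracket inside the integral of (36)
BOUNDARY = QG + R0             # bracket of the boundary term of (36)


def simpson(fn, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if b <= a:
        return 0.0
    step = (b - a) / n
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(a + i * step)
    return total * step / 3


def w_toy(remaining):
    """Right-hand side of (36) when the flow hits the boundary after `remaining`.

    With Lambda(x, s) = LAM * s the weight is exp((EPS - LAM) * s).
    """
    a = EPS - LAM
    integral = simpson(lambda s: math.exp(a * s) * K, 0.0, remaining)
    return integral + math.exp(a * remaining) * BOUNDARY


def w_closed_form(remaining):
    """Exact value of the same expression (valid because EPS != LAM)."""
    a = EPS - LAM
    return K * (math.exp(a * remaining) - 1.0) / a + math.exp(a * remaining) * BOUNDARY


# Toy flow phi(x, s) = x + s on [0, 1), so t*(x) = 1 - x.
x, t = 0.2, 0.3
t_star = 1.0 - x
lhs = w_toy(t_star)
# (39): integrate up to t, then restart from phi_hat(x, t), whose boundary
# is t*(x) - t time units away.
rhs = (simpson(lambda s: math.exp((EPS - LAM) * s) * K, 0.0, t)
       + math.exp((EPS - LAM) * t) * w_toy(t_star - t))
print(lhs, rhs, w_closed_form(t_star))
```

The three printed values agree to quadrature accuracy; (39) is just the additivity of the integral in (36) combined with the multiplicativity of the exponential weight along the flow.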



Consequently, this implies that
$$\mathbb{E}^U_{(x,0)}\Big[-b\int_0^{t\wedge T_m}e^{\epsilon s}\,ds+e^{\epsilon(t\wedge T_m)}w^U(\widehat X^U(t\wedge T_m))\Big]\le g(x).$$



Combining Fatou's Lemma and Remark 2.4, we obtain that
$$-\frac{b}{\epsilon}\big[e^{\epsilon t}-1\big]+e^{\epsilon t}\,\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t))\big]\le g(x), \tag{40}$$
showing the result. □
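The passage from (40) to the bound (33) is elementary algebra: multiply (40) by $e^{-\epsilon t}$ and rearrange. The short check below, with illustrative values of $g(x)$, $b$ and $\epsilon$ that are not from the paper, confirms that equivalence. It also checks that the right-hand side of (33) is a convex combination of $g(x)$ and $b/\epsilon$, hence bounded uniformly in $t$ and tending to $b/\epsilon$; this uniform bound is what makes the factor $1/t$ in (41) effective.

```python
import math


def bound_33(g_x, b, eps, t):
    """Right-hand side of (33): e^{-eps t} g(x) + (b / eps) (1 - e^{-eps t})."""
    return math.exp(-eps * t) * g_x + (b / eps) * (1.0 - math.exp(-eps * t))


def lhs_40(expected_w, b, eps, t):
    """Left-hand side of (40) as a function of E[w^U(X(t))]."""
    return -(b / eps) * (math.exp(eps * t) - 1.0) + math.exp(eps * t) * expected_w


# lhs_40 is increasing in expected_w (slope e^{eps t} > 0) and equals g(x)
# exactly at expected_w = bound_33(...), so (40) holds if and only if (33) does.
g_x, b, eps = 5.0, 1.0, 0.5
for t in (0.0, 0.5, 3.0):
    assert abs(lhs_40(bound_33(g_x, b, eps, t), b, eps, t) - g_x) < 1e-9
    # Convex combination of g(x) and b/eps: bounded uniformly in t.
    assert bound_33(g_x, b, eps, t) <= max(g_x, b / eps) + 1e-12
```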



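Proposition 5.6 For all $x\in E$ and $U\in\mathcal{U}$, the expectation $\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t\wedge T_m))\big]$ exists in $\mathbb{R}$ for any $(t,m)\in\mathbb{R}_+\times\mathbb{N}$, and
$$\lim_{t\to+\infty}\frac{1}{t}\lim_{m\to\infty}\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t\wedge T_m))\big]=0. \tag{41}$$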



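Proof: Clearly, we have
$$\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t\wedge T_m))\big]=\mathbb{E}^U_{(x,0)}\big[I_{\{t<T_m\}}w^U(\widehat X^U(t))\big]+\mathbb{E}^U_{(x,0)}\big[I_{\{t\ge T_m\}}w^U(\widehat X^U(T_m))\big],$$
and thus, by using Remark 3.8 ii),
$$-bK_1\le\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t\wedge T_m))\big]\le\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(t))\big]+\mathbb{E}^U_{(x,0)}\big[w^U(\widehat X^U(T_m))\big]+bK_1. \tag{42}$$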
Iterating Assumption 3.6, we obtain that for all $m\in\mathbb{N}$, $\mathbb{E}^U_{(x,0)}\big|w^U(\widehat X^U(T_m))\big|$ is bounded by $g(x)$ plus a constant that does not depend on $m$. Combining equations (33) and (42) with the previous inequality, the result follows. □
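The right-hand inequality in (42) rests on a pointwise fact: if $w\ge -bK_1$, which holds for $w^U$ by (35), then for any event $A$, $I_A w(a)+I_{A^c}w(b)\le w(a)+w(b)+bK_1$, because the term that is switched off is at least $-bK_1$. Taking expectations gives (42). The sketch below, with hypothetical function names and randomly drawn values, checks this pointwise bound and shows that it can fail once the lower bound on $w$ is dropped.

```python
import random


def split_bound_holds(w_a, w_b, indicator, bK1, tol=1e-12):
    """Check I_A w(a) + I_{A^c} w(b) <= w(a) + w(b) + bK1 for one outcome.

    `indicator` is True on A; w_a and w_b stand for w^U(X(t)) and w^U(X(T_m)).
    """
    lhs = w_a if indicator else w_b
    return lhs <= w_a + w_b + bK1 + tol


def random_check(trials=10_000, bK1=1.5, seed=0):
    """Draw values respecting the lower bound w >= -bK1 and test the inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        w_a = rng.uniform(-bK1, 10.0)
        w_b = rng.uniform(-bK1, 10.0)
        if not split_bound_holds(w_a, w_b, rng.random() < 0.5, bK1):
            return False
    return True


print(random_check())                           # lower bound respected
print(split_bound_holds(1.0, -3.0, True, 1.5))  # w_b = -3 < -bK1: bound fails
```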
Corollary 5.7 For all $U\in\mathcal{U}$,
$$\lim_{t\to+\infty}\frac{1}{t}\lim_{m\to\infty}\mathbb{E}^U_{(x,0)}\big[h(X(t\wedge T_m))\big]\le 0, \tag{43}$$
and
$$\lim_{t\to+\infty}\frac{1}{t}\lim_{m\to\infty}\mathbb{E}^{U_{u_\phi}}_{(x,0)}\big[h(X(t\wedge T_m))\big]=0. \tag{44}$$

Proof: From equations (24) and (25), it follows that for all $x\in E$ and $\Theta\in\mathcal{V}(x)$,
$$-\rho\mathcal{L}(x,u_\phi(x))+Gh(x,u_\phi(x))\le h(x)\le Lf(x,\Theta)+Hr(x,\Theta)+Gh(x,\Theta). \tag{45}$$



Consequently, by using Remark 3.8 ii), the definition of $w^U$ and Assumption 3.4, we obtain that there exists $M_1>0$ such that for any $U\in\mathcal{U}$,
$$h(X(t\wedge T_m))\le M_1\big[w^U(\widehat X^U(t\wedge T_m))+bK_1\big].$$
Combining the previous inequality with (41), we obtain equation (43).
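Moreover, notice that $[\Gamma^{U_{u_\phi}}(N(t),Z(t))]_{\tau(t)}=u_\phi(X(t))$, and so equation (45) implies
$$-M_1\big[w^{U_{u_\phi}}(\widehat X^{U_{u_\phi}}(t\wedge T_m))+bK_1\big]-\rho K_1\le h(X(t\wedge T_m)).$$
By using equation (41), this yields $\lim_{t\to+\infty}\frac{1}{t}\lim_{m\to\infty}\mathbb{E}^{U_{u_\phi}}_{(x,0)}\big[h(X(t\wedge T_m))\big]\ge 0$. Combining the previous inequality with (43), the result follows. □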



Finally, we can present our second main result. It states that the measurable selector of the optimality equation (24) associated with $(\rho,h)$ gives an optimal feedback control $U_{u_\phi}$ for the process $\{X(t)\}$.

Theorem 5.8 The control $U_{u_\phi}$ is an optimal strategy for the long-run average control problem:
$$\rho=\inf_{U\in\mathcal{U}}\mathcal{A}(U,x)=\mathcal{A}(U_{u_\phi},x)$$
for all $x\in E$.



Proof: From Proposition 5.6, $\mathbb{E}^U_{(x,0)}\big[h(X(t\wedge T_m))\big]$ is well defined. Therefore, following the same arguments as in Proposition 4.3 in [4], it can be shown that
$$\mathbb{E}^U_{(x,0)}\Big[\int_0^{t\wedge T_m}f(X(s),u(N(s),Z(s),\tau(s)))\,ds+\int_0^{t\wedge T_m}r(X(s-),u_\partial(N(s),X(s-)))\,dp^*(s)+h(X(t\wedge T_m))\Big]\ge\rho\,\mathbb{E}^U_{(x,0)}[t\wedge T_m]+h(x),$$
where $U=(u,u_\partial)\in\mathcal{U}$. From equation (43), it follows that
$$\lim_{t\to+\infty}\frac{1}{t}\,\mathbb{E}^U_{(x,0)}\Big[\int_0^{t}f(X(s),u(N(s),Z(s),\tau(s)))\,ds+\int_0^{t}r(X(s-),u_\partial(N(s),X(s-)))\,dp^*(s)\Big]\ge\rho,$$
showing that $\inf_{U\in\mathcal{U}}\mathcal{A}(U,x)\ge\rho$.

From equation (44), it can be shown by using the same arguments as in the proof of Proposition 4.4 in [?] that



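$$\lim_{t\to+\infty}\frac{1}{t}\,\mathbb{E}^{U_{u_\phi}}_{(x,0)}\Big[\int_0^{t}f(X(s),u(X(s)))\,ds+\int_0^{t}r(X(s-),u(X(s-)))\,dp^*(s)\Big]\le\rho-\lim_{t\to+\infty}\frac{1}{t}\lim_{m\to\infty}\mathbb{E}^{U_{u_\phi}}_{(x,0)}\big[h(X(t\wedge T_m))\big],$$
and by (44) the right-hand side equals $\rho$. Hence $\mathcal{A}(U_{u_\phi},x)\le\rho$, implying that $\inf_{U\in\mathcal{U}}\mathcal{A}(U,x)\le\rho$.

Therefore, $\rho=\inf_{U\in\mathcal{U}}\mathcal{A}(U,x)=\mathcal{A}(U_{u_\phi},x)$ for all $x\in E$. □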
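The proof of Theorem 5.8 compares accumulated cost with $\rho t+h(x)$ and then shows that the remainder $\frac{1}{t}\mathbb{E}[h(X(t\wedge T_m))]$ vanishes, as in (44). The sketch below shows the same mechanism in a discrete-time setting that is ours, not the paper's: a hypothetical two-state chain under a fixed stationary policy, with $(\rho,h)=(23/30,(0,1/3))$ solving the discrete Poisson equation $h=c-\rho+Ph$. Telescoping that equation gives, exactly, $\frac{1}{T}\sum_{t<T}\mathbb{E}[c(X_t)]=\rho+\big(h(x_0)-\mathbb{E}[h(X_T)]\big)/T$, so the time average converges to $\rho$ at rate $1/T$. The function name is illustrative.

```python
def cesaro_average_cost(P, c, x0, T):
    """Return ((1/T) * sum_{t<T} E[c(X_t)], law of X_T) for a chain started at x0."""
    n = len(c)
    dist = [1.0 if j == x0 else 0.0 for j in range(n)]
    total = 0.0
    for _ in range(T):
        total += sum(dist[j] * c[j] for j in range(n))
        # One step of the chain: dist <- dist P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total / T, dist


# Hypothetical chain under a fixed stationary policy, with (rho, h) solving the
# discrete Poisson equation h = c - rho + P h (h normalized by h[0] = 0).
P = [[0.2, 0.8], [0.7, 0.3]]
c = [0.5, 1.0]
rho, h = 23 / 30, [0.0, 1 / 3]

T, x0 = 2000, 0
avg, dist_T = cesaro_average_cost(P, c, x0, T)
# Remainder term (h(x0) - E[h(X_T)]) / T, the discrete analogue of (44).
correction = (h[x0] - sum(dist_T[j] * h[j] for j in range(2))) / T
print(avg, rho + correction)
```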